概述与架构演进图景
EvoClass-AI003 Lecture 4


Moving on from AlexNet's foundational success, we enter the era of very deep convolutional neural networks (CNNs). This transition demanded deep architectural innovation to cope with the challenges of extreme depth while keeping training stable. We will analyze three landmark architectures, VGG, GoogLeNet (Inception), and ResNet, and see how each addresses a different facet of the scaling problem, laying the groundwork for the rigorous analysis of model interpretability later in this lecture.

1. Structural Simplicity: VGG

VGG maximized depth by committing to an extremely uniform, tiny kernel size: stacks restricted to 3×3 convolutions. Although computationally expensive, its structural uniformity demonstrated that raw depth, achieved with minimal architectural variation, was the primary driver of performance gains, cementing the importance of small receptive fields.
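The trade-off above can be made concrete with a little arithmetic: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with fewer weights. A minimal sketch, assuming C input and C output channels and ignoring biases (the channel count 64 is illustrative, not a specific VGG layer):

```python
# Sketch: why stacking small 3x3 kernels (VGG-style) beats one large kernel.
# Channel count C is illustrative; biases are ignored for clarity.

def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a single k x k convolution layer (no bias)."""
    return k * k * c_in * c_out

def stacked_receptive_field(k: int, n_layers: int) -> int:
    """Receptive field of n stacked k x k convolutions with stride 1."""
    return n_layers * (k - 1) + 1

C = 64
two_3x3 = 2 * conv_params(3, C, C)   # two stacked 3x3 layers
one_5x5 = conv_params(5, C, C)       # single 5x5 layer

print(stacked_receptive_field(3, 2))  # 5 -> same 5x5 receptive field
print(two_3x3, one_5x5)               # 73728 vs 102400 weights
```

The stacked design also inserts an extra non-linearity between the two 3×3 layers, which the single 5×5 layer lacks.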

2. Computational Efficiency: GoogLeNet (Inception)

GoogLeNet tackled VGG's high computational cost by prioritizing efficiency and multi-scale feature extraction. Its core innovation is the Inception module, which runs convolutions (1×1, 3×3, 5×5) and pooling in parallel. Crucially, it uses 1×1 convolutions as bottlenecks, sharply reducing parameter counts and computational complexity before the expensive operations.
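The bottleneck saving is easy to quantify. A hedged sketch with illustrative channel counts (256 input channels squeezed to 64 before a 5×5 branch producing 32 maps; these are not the exact GoogLeNet values):

```python
# Sketch of the 1x1 bottleneck saving in one Inception branch.
# Channel counts are illustrative, not taken from the GoogLeNet paper.

def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a single k x k convolution layer (no bias)."""
    return k * k * c_in * c_out

C_IN, C_MID, C_OUT = 256, 64, 32

# 5x5 convolution applied directly to the full input depth:
direct = conv_params(5, C_IN, C_OUT)

# 1x1 bottleneck first, then the 5x5 on the reduced depth:
bottleneck = conv_params(1, C_IN, C_MID) + conv_params(5, C_MID, C_OUT)

print(direct, bottleneck)  # 204800 vs 67584: roughly 3x fewer weights
```

Since the same reduction applies to the multiply-accumulate count at every spatial position, the compute saving mirrors the parameter saving.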

Core Engineering Challenges
Question 1
Which architecture emphasized structural uniformity using mostly 3x3 filters to maximize depth?
AlexNet
VGG
GoogLeNet
ResNet
Question 2
The 1x1 convolution is primarily used in the Inception Module for what fundamental purpose?
Increasing feature map resolution
Non-linear activation
Dimensionality reduction (bottleneck)
Spatial attention
Critical Challenge: Vanishing Gradients
Engineering Solutions for Optimization
Explain how ResNet’s identity mapping fundamentally addresses the Vanishing Gradient problem beyond techniques like improved weight initialization or Batch Normalization.
Q1
Describe the mechanism by which the skip connection stabilizes gradient flow during backpropagation.
Solution:
The skip connection adds an identity term ($+x$) to the block output, $H(x) = F(x) + x$, which makes the local derivative additive: $\frac{\partial H}{\partial x} = \frac{\partial F}{\partial x} + 1$. By the chain rule, $\frac{\partial \text{Loss}}{\partial x} = \frac{\partial \text{Loss}}{\partial H}\left(\frac{\partial F}{\partial x} + 1\right)$, so the gradient always has a direct path backwards through the identity branch. This guarantees that the upstream weights receive a non-zero, usable gradient signal, regardless of how small the gradients through the residual function $F(x)$ become.